Abstract. We prove that Nakhleh's latest 'metric' for phylogenetic networks is a
metric on the classes of tree-child phylogenetic networks, of semi-binary time consistent
tree-sibling phylogenetic networks, and of multi-labeled phylogenetic trees. We
also prove that it separates distinguishable phylogenetic networks. In this way, it 
becomes the strongest dissimilarity measure for phylogenetic networks available so 
far. 



1 Introduction 

Evolutionary networks are explicit models of evolutionary histories that include reticulate 
evolutionary events like genetic recombinations, lateral gene transfers or hybridizations. 
There are currently many algorithms and software tools that make it possible to recon- 
struct evolutionary networks. As in the classical phylogenetic tree reconstruction setting, 
the assessment of evolutionary network reconstruction methods requires the ability to 
compare phylogenetic networks; for instance, to compare inferred networks with either 
simulated networks or true phylogenies, and to evaluate the robustness of phylogenetic 
network reconstruction algorithms when adding new species [11,15]. This has led to an
increasing interest in the definition of dissimilarity measures for the comparison of evolutionary
networks, and their implementation in software packages.

These dissimilarity measures include the bipartitions, or Robinson-Foulds, metric [1],
which satisfies the axioms of metrics [9] on the classes of regular networks [1] and of tree-child
time consistent phylogenetic networks [8]; the tripartitions metric [11], which satisfies
the axioms of metrics on the class of tree-child time consistent phylogenetic networks [8];
the μ-distance [7], which is a metric on the classes of all tree-child phylogenetic networks
[7] and of semi-binary tree-sibling time consistent phylogenetic networks [3]; the triplets
metric, which is a metric on the class of tree-child time consistent phylogenetic networks
[4]; and a nodal metric that is again a metric on the class of tree-child time consistent
phylogenetic networks [4,5].

L. Nakhleh has recently proposed a dissimilarity measure for the comparison of phylo- 
genetic networks [12] and he has proved that it satisfies the separation axiom of metrics 
(zero distance means isomorphism) on the class of all reduced phylogenetic networks in 
the sense of [11], and hence that it is a metric on this class of networks. In this note
(which should be seen as a sequel to our previous technical report [6]) we complement
and generalize Nakhleh's work in two directions. On the one hand, we prove a stronger
result: namely, that for this dissimilarity measure, zero distance implies indistinguishability
up to reduction in the sense of [11], a goal that had already been unsuccessfully
pursued by Moret, Nakhleh, Warnow et al. in loc. cit. In this way, and to the best of our
knowledge, Nakhleh's dissimilarity measure turns out to be the first one that separates
distinguishable networks. On the other hand, we show that this dissimilarity measure is
a metric on several classes of phylogenetic networks and related (multi-)labelled DAGs: 
namely, on the classes of tree-child phylogenetic networks, of semi-binary time consistent 
tree-sibling phylogenetic networks, and of multi-labelled phylogenetic trees. Adding this 
to the aforementioned fact, previously proved by Nakhleh, that it is a metric on the class 
of all reduced networks, it turns out that his latest dissimilarity measure for phylogenetic 
networks has the strongest separation power among all metrics defined so far. 

2 Preliminaries 

2.1 Notations 

Let $N = (V, E)$ be a DAG (a finite directed acyclic graph). We say that a node $v \in V$ is
a child of $u \in V$ if $(u, v) \in E$; we also say then that $u$ is a parent of $v$. Two children of
the same parent are said to be siblings of each other. A leaf is a node without children. A
node that is not a leaf is called internal. We say that a node is a tree node when it has at 
most one parent, and that it is a hybrid node when it has more than one parent. A DAG 
is semi-binary when all its hybrid nodes have exactly two parents and one child, without
any restriction on the number of children of its tree nodes. A DAG is rooted when it has 
only one root: a node without parents. 

A path in $N = (V, E)$ is a sequence of nodes $(v_0, v_1, \ldots, v_k)$ such that $(v_{i-1}, v_i) \in E$ for
all $i = 1, \ldots, k$. We call $v_0$ the origin of the path, $v_1, \ldots, v_{k-1}$ its intermediate nodes, $v_k$
its end, and $k$ its length; a path is non-trivial when its length is larger than 0. We denote
by $u \rightsquigarrow v$ any path with origin $u$ and end $v$ and, whenever there exists a path $u \rightsquigarrow v$, we say
that $v$ is a descendant of $u$ and that $u$ is an ancestor of $v$; if the path $u \rightsquigarrow v$ is non-trivial,
we say that $v$ is a proper descendant of $u$ and that $u$ is a proper ancestor of $v$.

The height $h(v)$ of a node $v$ in a DAG $N$ is the largest length of a path from $v$ to a
leaf. The absence of cycles implies that the nodes of a DAG can be stratified by means
of their heights: the leaves are the nodes of height 0 and, for every $m \geq 1$, the nodes of
height $m$ are those internal nodes with all their children of height smaller than $m$ and at
least one child of height exactly $m - 1$.
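The stratification by heights can be illustrated with a minimal Python sketch. It uses the recursion "a leaf has height 0, and an internal node has height one more than the maximum height of its children". The dictionary-of-children encoding, the name `node_heights` and the small toy network are assumptions made only for this illustration; they do not come from the text.

```python
from functools import lru_cache

def node_heights(children):
    """Return {node: height} for a DAG encoded as {node: [children]}.

    Nodes that never appear as a key are treated as leaves (height 0).
    """
    @lru_cache(maxsize=None)
    def h(v):
        kids = children.get(v, [])
        # Largest length of a path from v to a leaf.
        return 0 if not kids else 1 + max(h(c) for c in kids)

    nodes = set(children) | {c for kids in children.values() for c in kids}
    return {v: h(v) for v in nodes}

# Hypothetical toy network: leaves 1, 2, 3; node "h" is hybrid (parents "a", "b").
toy = {"r": ["a", "b"], "a": [1, "h"], "b": ["h", 3], "h": [2]}
heights = node_heights(toy)
print(heights["r"], heights["a"], heights["h"], heights[2])  # 3 2 1 0
```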

Let $S$ be a non-empty finite set, whose elements are called taxa or other Operational
Taxonomic Units; unless otherwise stated, for simplicity we shall always take as $S$ the set
of positive integers $\{1, \ldots, n\}$, with $n = |S|$. A phylogenetic network on a set $S$ of taxa is
a rooted DAG whose leaves are bijectively labeled by elements of $S$. A phylogenetic tree
is a phylogenetic network without hybrid nodes. We shall always identify, usually without
any further notice, each leaf of a phylogenetic network with its label. Two phylogenetic
networks $N, N'$ are isomorphic, in symbols $N \cong N'$, when they are isomorphic as directed
graphs and the isomorphism sends each leaf of $N$ to the leaf with the same label in $N'$.

A phylogenetic network $N = (V, E)$ is said to be tree-child when every internal node
has some child that is a tree node, tree-sibling when every hybrid node has some sibling
that is a tree node, and time consistent when it allows a mapping

$$\tau : V \to \mathbb{N}$$

such that, for every arc $(u, v) \in E$, $\tau(u) < \tau(v)$ if $v$ is a tree node and $\tau(u) = \tau(v)$ if $v$ is a
hybrid node. The biological meaning of these conditions has been discussed in [2, 3, 7, 11].

For every node $u$ of a phylogenetic network $N = (V, E)$, let $C(u)$ be the set of all its
descendants in $N$ and $N(u)$ the subgraph of $N$ supported on $C(u)$: it is still a phylogenetic
network, with root $u$ and leaves labeled in the subset $C_L(u) \subseteq S$ of labels of the leaves
that are descendants of $u$. We shall call $N(u)$ the rooted subnetwork of $N$ generated by $u$,
and the set of leaves $C_L(u)$ the cluster of $u$.

A clade of a phylogenetic network $N$ is a rooted subnetwork of $N$ all of whose nodes are
tree nodes in $N$ (and, in particular, it is a rooted tree).

Let $S$ be again a finite set of labels and $\mathcal{P}^+(S)$ the set of its non-empty subsets. A
(rooted) multi-labeled phylogenetic tree (a MUL-tree, for short) over $S$ is a rooted tree
whose leaves are labeled in $\mathcal{P}^+(S)$. In particular, two leaves in a MUL-tree may share one
or more labels. More generally, a multi-labeled DAG over $S$ is a DAG whose leaves are
labeled in $\mathcal{P}^+(S)$.

2.2 Multisets and metrics 

Let $\mathcal{C}$ be a class endowed with a notion of isomorphism $\cong$; for instance, the class of all
phylogenetic networks on a given set of taxa. A metric on $\mathcal{C}$ is a mapping

$$d : \mathcal{C} \times \mathcal{C} \to \mathbb{R}$$

satisfying the following axioms: for every $A, B, C \in \mathcal{C}$,

(a) Non-negativity: $d(A, B) \geq 0$

(b) Separation: $d(A, B) = 0$ if and only if $A \cong B$

(c) Symmetry: $d(A, B) = d(B, A)$

(d) Triangle inequality: $d(A, C) \leq d(A, B) + d(B, C)$

A finite multiset of elements of a set $X$ is a mapping $M : X \to \mathbb{N}$ such that its support
$\{x \in X \mid M(x) \neq 0\}$ is finite. If the support of a finite multiset $M : X \to \mathbb{N}$ is $\{x_1, \ldots, x_k\}$,
then this multiset can be understood as a (sort of) set consisting of $M(x_i)$ copies of $x_i$,
for every $i = 1, \ldots, k$: in this context, $M(x)$ is called the multiplicity of $x \in X$ in this
multiset, and this multiplicity is 0 when $x$ does not belong to the support.

The cardinal $|M|$ of a finite multiset $M$ of elements of $X$ is simply the sum of the
multiplicities of the elements:

$$|M| = \sum_{x \in X} M(x).$$

The symmetric difference of two finite multisets $M_1, M_2$ of elements of a set $X$ is the
finite multiset

$$M_1 \,\triangle\, M_2 : X \to \mathbb{N}, \qquad x \mapsto |M_1(x) - M_2(x)|.$$

Thus, if an element of $X$ has multiplicity $m_1$ in $M_1$ and $m_2$ in $M_2$, then it has multiplicity
$|m_1 - m_2|$ in $M_1 \,\triangle\, M_2$.

Given a set $X$, we shall denote by $\mathcal{M}(X)$ the class of all finite multisets of elements of
$X$. The mapping

$$d : \mathcal{M}(X) \times \mathcal{M}(X) \to \mathbb{R}, \qquad (M_1, M_2) \mapsto |M_1 \,\triangle\, M_2|$$

that associates to each pair of finite multisets the cardinal of their symmetric difference is
a metric on $\mathcal{M}(X)$, taking as notion of isomorphism the equality of multisets; this metric
is called the symmetric difference metric on $\mathcal{M}(X)$ (see, for instance, [9, p. 25] for the
general version on a measure space). Since the condition of being a metric is not affected
by the multiplication by a positive scalar factor, $\frac{1}{2}d$ is also a metric on $\mathcal{M}(X)$.
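A short Python sketch may help to fix the definitions of symmetric difference and symmetric difference metric for finite multisets. Representing a multiset by `collections.Counter` and the function names `multiset_sym_diff` and `sym_diff_distance` are assumptions of this illustration, not notation from the text.

```python
from collections import Counter

def multiset_sym_diff(m1, m2):
    """Symmetric difference: x gets multiplicity |m1(x) - m2(x)|."""
    keys = set(m1) | set(m2)
    # Counter returns 0 for missing elements, matching multiplicity 0.
    return Counter({x: abs(m1[x] - m2[x]) for x in keys if m1[x] != m2[x]})

def sym_diff_distance(m1, m2):
    """Cardinal of the symmetric difference of two finite multisets."""
    return sum(multiset_sym_diff(m1, m2).values())

# Hypothetical multisets: M1 = {a, a, b}, M2 = {a, b, b, c}.
M1 = Counter("aab")
M2 = Counter("abbc")
print(sym_diff_distance(M1, M2))  # 3
```

Halving the value returned by `sym_diff_distance` gives the rescaled metric $\frac{1}{2}d$ mentioned above.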

We shall use several times, usually without any further notice, the following easy result. 



Proposition 1. Let $F : \mathcal{C} \to \mathcal{M}(X)$ be a mapping such that if $A \cong B$, then $F(A) = F(B)$.
Then, the mapping

$$d_F : \mathcal{C} \times \mathcal{C} \to \mathbb{R}, \qquad (A, B) \mapsto \tfrac{1}{2}\,|F(A) \,\triangle\, F(B)|$$

is a metric on $\mathcal{C}$ if, and only if, it satisfies the following condition:
- If $F(A) = F(B)$, then $A \cong B$.

Proof. Notice that $d_F(A, B) = \frac{1}{2} d(F(A), F(B))$. Then, the non-negativity, symmetry and
triangle inequality axioms for $d_F$ are derived from the corresponding properties of $\frac{1}{2}d$,
without any further assumption. As far as the separation axiom goes, if $A \cong B$, then
$F(A) = F(B)$ and hence $d_F(A, B) = 0$, also without any further assumption on $F$. The
converse implication in the separation axiom says

$$|F(A) \,\triangle\, F(B)| = 0 \text{ implies } A \cong B,$$

and since (by the separation axiom for the symmetric difference metric) $|F(A) \,\triangle\, F(B)| = 0$
is equivalent to $F(A) = F(B)$, it is clear that this remaining condition is equivalent to the
condition given in the statement. □



2.3 The μ-distance

Let $N = (V, E)$ be a phylogenetic network on $S = \{1, \ldots, n\}$. For every node $v \in V$ and
for every $i = 1, \ldots, n$, let $m_i(v)$ be the number of different paths from $v$ to the leaf $i$. The
path-multiplicity vector, or μ-vector for short, of $v \in V$ is

$$\mu(v) = (m_1(v), \ldots, m_n(v)).$$

The μ-representation of $N$ is the multiset

$$\mu(N) = \{\mu(v) \mid v \in V\},$$

where every vector appears with multiplicity the number of nodes having it as their
μ-vector.

The μ-distance between a pair of phylogenetic networks $N_1$ and $N_2$ on the same set $S$
of taxa is

$$d_\mu(N_1, N_2) = \tfrac{1}{2}\,|\mu(N_1) \,\triangle\, \mu(N_2)|,$$

where $\triangle$ denotes the symmetric difference of multisets.
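The definitions of this subsection can be sketched in a few lines of Python. The sketch computes μ-vectors bottom-up, using the fact that the μ-vector of an internal node is the sum of the μ-vectors of its children (the additivity property cited later in the text from [7, Lem. 4]). The dictionary encoding, the function names and the two toy networks are assumptions of this illustration.

```python
from collections import Counter
from functools import lru_cache

def mu_representation(children, n):
    """Multiset (Counter) of mu-vectors of a network on leaves 1..n."""
    @lru_cache(maxsize=None)
    def mu(v):
        kids = children.get(v, [])
        if not kids:  # v is a leaf, identified with its label in 1..n
            return tuple(1 if i == v else 0 for i in range(1, n + 1))
        # Paths from v are obtained by extending the paths from its children.
        return tuple(sum(col) for col in zip(*(mu(c) for c in kids)))

    nodes = set(children) | {c for kids in children.values() for c in kids}
    return Counter(mu(v) for v in nodes)

def mu_distance(net1, net2, n):
    """Half the cardinal of the symmetric difference of mu-representations."""
    r1, r2 = mu_representation(net1, n), mu_representation(net2, n)
    return sum(abs(r1[x] - r2[x]) for x in set(r1) | set(r2)) / 2

# Hypothetical examples: a network with one hybrid node "h", and a tree.
net = {"r": ["a", "b"], "a": [1, "h"], "b": ["h", 3], "h": [2]}
tree = {"r": ["a", 3], "a": [1, 2]}
print(mu_distance(net, tree, 3))  # 2.0
```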

This μ-distance is known to be a metric on several well-defined classes of phylogenetic
networks. More specifically, we have the following result (and then Proposition 1 applies).

Theorem 1. Let $N_1$ and $N_2$ be two phylogenetic networks on the same set $S$ of taxa.
Assume that one of the following two conditions holds:

(a) $N_1$ and $N_2$ are both tree-child, or

(b) $N_1$ and $N_2$ are both semi-binary, time consistent and tree-sibling.

Then, $\mu(N_1) = \mu(N_2)$ implies $N_1 \cong N_2$. □

For a proof of the case (a), see [7], and for the case (b), see [3].



2.4 Moret-Nakhleh-Warnow et al.'s reduction process

Let $N = (V, E)$ be a phylogenetic network on a set $S$ of taxa. Two nodes in $N$ are said
to be convergent when they have the same cluster. The removal of convergent sets is the 
basis of the following reduction procedure introduced in [11]. 

Let $N = (V, E)$ be a phylogenetic network on $S$. If $N$ does not contain any pair of
convergent nodes (for instance, if it is a phylogenetic tree), then the reduction procedure
does nothing. Otherwise:

(0) For every clade $T$ of $N$, with root $r_T$:

- Add a new node $h_T$ between $r_T$ and its only parent.

- Label $h_T$ with some symbol representing the clade $T$.

- Remove $r_T$ and its descendants, so that $h_T$ becomes a leaf: we shall call it a
symbolic leaf.

After this step, the resulting multi-labeled DAG $N^*$ has two kinds of leaves: symbolic,
which replace clades, and hybrid, which did not belong to any clade in $N$ (the reconstructible
phylogenetic networks considered in [11] could not contain hybrid leaves, but
they can be handled without any problem by the reduction procedure).

(1) All internal nodes that are convergent in $N$ with some other node are removed from
$N^*$, and all internal nodes of $N^*$ that are descendants of some removed node are also
removed.

(2) For every remaining node $x$ in $N^*$ that was a parent of a node $v$ that has been removed
in (1), add a new arc from $x$ to every (hybrid or symbolic) leaf that was a descendant
of $v$ in $N^*$, if such an arc does not exist yet.

The resulting DAG contains no set of convergent nodes, because any pair of convergent
nodes in it would have already been convergent in $N$.

(3) For every symbolic leaf $h_T$, unlabel it and append to it the corresponding clade $T$,
with an arc from $h_T$ to $r_T$.

(4) Replace every node with only one parent and one child by an arc from its parent to
its only child.

Since the DAG resulting from (2) contains no pair of convergent nodes, it contains
no node with only one child. Therefore the only possible nodes with only one parent
and one child after step (3) are those that were symbolic leaves with only one parent.
These are the only nodes that have to be removed in this step.

Notice that the effect of (3) and (4) is not exactly the replacement of each symbolic
leaf by the corresponding clade: the symbolic leaf $h_T$ survives after (4) if it has more than
one incoming arc, and in this case the clade $T$ is appended to $h_T$, instead of replacing it.

The output of this procedure applied to a phylogenetic network $N$ on $S$ is a (not
necessarily rooted) leaf-labeled DAG, called the reduced version of $N$ and denoted by
$R(N)$.
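The whole procedure is not reproduced here, but its starting point, the detection of convergent nodes, can be sketched in Python. The sketch computes the cluster of every node and groups nodes sharing a cluster. The dictionary encoding, the name `convergent_sets` and the toy network are assumptions of this illustration; none of the steps (0) to (4) is implemented.

```python
from collections import defaultdict
from functools import lru_cache

def convergent_sets(children):
    """Return the maximal sets (size > 1) of nodes sharing the same cluster."""
    @lru_cache(maxsize=None)
    def cluster(v):
        kids = children.get(v, [])
        if not kids:  # a leaf is identified with its label
            return frozenset([v])
        return frozenset().union(*(cluster(c) for c in kids))

    nodes = set(children) | {c for kids in children.values() for c in kids}
    groups = defaultdict(set)
    for v in nodes:
        groups[cluster(v)].add(v)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical toy network: the hybrid node "h" and its only child, leaf 2,
# have the same cluster {2}, so they are convergent.
toy = {"r": ["a", "b"], "a": [1, "h"], "b": ["h", 3], "h": [2]}
print(convergent_sets(toy))
```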

Example 1. Let us compute the reduced version of the phylogenetic network $N$ represented
in Fig. 1. The subtree rooted at $d$, with leaves 1 and 2, is a clade, and each one of the
other leaves forms a clade by itself. The graph $N^*$ obtained after step (0) is depicted in
Fig. 2.(0), where the symbolic leaves that replace clades are represented by dashed circles.
The maximal sets of convergent nodes in $N$ are

$$\{b, c\},\ \{A, B, e, h\},\ \{C, 3\},\ \{D, 5\}.$$

So, in step (1) we remove the nodes $b, c, e, h, A, B, C, D$, as well as all intermediate nodes
in paths from them to symbolic leaves: this also removes the nodes $f, g$. So, the only
internal nodes that remain after step (1) are $r$ and $a$. This yields the graph depicted
in Fig. 2.(1).

Fig. 1. The phylogenetic network $N$ in Example 1.

In step (2), we add new arcs from $r$ and $a$ to the symbolic leaves that were descendants
of removed descendants of them. This yields the graph in Fig. 2.(2).

In step (3), we append again to each symbolic leaf the clade it represented, and we
unlabel the symbolic leaves: see Fig. 2.(3).

Finally, in step (4) the parents of the node $d$ and of the leaves 4 and 6 are removed and
replaced by arcs $(a, d)$, $(r, 4)$, and $(r, 6)$, respectively. The parents of leaves 3 and 5 remain,
and they are hybrid in the resulting reduced network $R(N)$, which is given in Fig. 2.(4).

A phylogenetic network $N$ is reduced when $R(N) \cong N$. From the given description of
the reduction procedure, it is easy to deduce that a phylogenetic network is reduced if, and
only if, every pair of convergent nodes in it consists of a hybrid node of out-degree 1 and 
with all its proper descendants of tree type (thus forming a clade), and its only child. In 
particular, if we impose that all hybrid nodes in a phylogenetic network have out-degree 
1, as it is done for instance in reconstructible networks in the sense of [11], then a reduced 
network cannot contain any hybrid node that is a proper descendant of another hybrid 
node. 

Two networks $N_1$ and $N_2$ are said to be indistinguishable when they have isomorphic
reduced versions, that is, when $R(N_1) \cong R(N_2)$. Moret, Nakhleh, Warnow, et al. argue in
[11, p. 19] that for reconstructible phylogenetic networks this notion of indistinguishability 
(isomorphism after simplification) is more suitable than the existence of an isomorphism 
between the original networks. 

3 Nakhleh's 'metric' m 

In this section we describe the dissimilarity measure m introduced by Nakhleh in [12]. 
After recalling Nakhleh's definition, we provide an alternative definition, as the cardinal of 
the symmetric difference of certain representations of the networks, which allows simpler 
proofs of the new results reported in this paper. 

Nakhleh begins by defining the following equivalence of nodes in pairs of phylogenetic 
networks. 




Fig. 2. The resulting DAGs after the different steps in the reduction process applied to $N$.



Definition 1. Let $N_1 = (V_1, E_1)$ and $N_2 = (V_2, E_2)$ be a pair of phylogenetic networks
(not necessarily different). Two nodes $u \in V_1$ and $v \in V_2$ are equivalent, in symbols $u \equiv v$,
when:

- $u$ and $v$ are both leaves labeled with the same taxon, or

- for some $k \geq 1$, node $u$ has exactly $k$ children $u_1, \ldots, u_k$, node $v$ has exactly $k$ children
$v_1, \ldots, v_k$, and $u_i \equiv v_i$ for every $i = 1, \ldots, k$.

For every node $v$ in a phylogenetic network, let $\kappa(v)$ be the number of nodes in this network
that are equivalent to it.

Then, he defines his dissimilarity measure by comparing the cardinals of equivalence 
classes of nodes in pairs of phylogenetic networks. 

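Definition 2. Let $N_1$ and $N_2$ be a pair of phylogenetic networks on the same set $S$ of
taxa, and let $U(N_1)$ and $U(N_2)$ be maximal sets of non-equivalent nodes in them. For
every $v_1 \in U(N_1)$, let

$$\delta(v_1) = \begin{cases} \kappa(v_1) & \text{if no node in } U(N_2) \text{ is equivalent to } v_1 \\ \max\{0, \kappa(v_1) - \kappa(v_1')\} & \text{if } v_1' \in U(N_2) \text{ is equivalent to } v_1 \end{cases}$$

The value $\delta(v_2)$, for every $v_2 \in U(N_2)$, is defined in a similar way.
Then, let

$$m(N_1, N_2) = \frac{1}{2} \Big( \sum_{v_1 \in U(N_1)} \delta(v_1) + \sum_{v_2 \in U(N_2)} \delta(v_2) \Big).$$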



To introduce our version of this metric, we define first a nested labeling of the nodes 
of a phylogenetic network. 

Definition 3. Let $N = (V, E)$ be a phylogenetic network on a set $S$ of taxa. The nested
label $\ell(v)$ of a node $v$ of $N$ is defined by induction on $h(v)$ as follows:

- If $h(v) = 0$, that is, if $v$ is a leaf, then $\ell(v)$ is the singleton consisting of its label.

- If $h(v) = m > 0$, then all its children $v_1, \ldots, v_k$ have height smaller than $m$, and hence
they have been already labeled: then, $\ell(v)$ is the multiset of their nested labels,

$$\ell(v) = \{\ell(v_1), \ldots, \ell(v_k)\}.$$

Notice that the nested label of a node is, in general, a nested multiset (a multiset
of multisets of multisets of...), hence its name. Moreover, the height of a node $u$ is the
highest level of nesting of a leaf in $\ell(u)$ minus 1, and the cluster of $u$ consists of the taxa
appearing in $\ell(u)$.
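The inductive definition of nested labels translates directly into a recursive computation. In the following Python sketch a nested multiset is encoded as a canonical string, obtained by sorting the canonical strings of the children, so that two nodes have equal nested labels exactly when their strings are equal. This string encoding, the name `nested_labels` and the toy network are assumptions of the illustration.

```python
from functools import lru_cache

def nested_labels(children):
    """Return {node: canonical string of its nested label}."""
    @lru_cache(maxsize=None)
    def label(v):
        kids = children.get(v, [])
        if not kids:  # leaf: singleton consisting of its label
            return "{%s}" % v
        # Multiset of the children's labels, in a fixed (sorted) order.
        return "{" + ",".join(sorted(label(c) for c in kids)) + "}"

    nodes = set(children) | {c for kids in children.values() for c in kids}
    return {v: label(v) for v in nodes}

# Hypothetical toy network with one hybrid node "h".
toy = {"r": ["a", "b"], "a": [1, "h"], "b": ["h", 3], "h": [2]}
labels = nested_labels(toy)
print(labels["h"])  # {{2}}
print(labels["a"])  # {{1},{{2}}}
```

The order of the elements inside the braces is an artifact of the encoding: as multisets, `{{1},{{2}}}` and `{{{2}},{1}}` are the same nested label.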

Example 2. Table 1 gives the nested labels of the nodes of the phylogenetic network depicted
in Fig. 1, sorted by their height.

Table 1. Nested labels of the nodes of the phylogenetic network N in Fig. 1.

Node  Nested label
1     {1}
2     {2}
3     {3}
4     {4}
5     {5}
6     {6}
d     {{1},{2}}
C     {{3}}
D     {{5}}
e     {{{3}},{{5}}}
h     {{{3}},{{5}}}
A     {{{{3}},{{5}}}}
B     {{{{3}},{{5}}}}
f     {{4},{{{{3}},{{5}}}}}
g     {{{{{3}},{{5}}}},{6}}
c     {{{{{{3}},{{5}}}},{6}},{{4},{{{{3}},{{5}}}}}}
a     {{{1},{2}},{{{{3}},{{5}}}}}
b     {{{{{3}},{{5}}}},{{{{{{3}},{{5}}}},{6}},{{4},{{{{3}},{{5}}}}}}}
r     {{{{1},{2}},{{{{3}},{{5}}}}},{{{{{3}},{{5}}}},{{{{{{3}},{{5}}}},{6}},{{4},{{{{3}},{{5}}}}}}}}



We shall say that a nested label $\ell(v)$ is contained in a nested label $\ell(u)$, in symbols
$\ell(v) \preceq \ell(u)$, when $\ell(v)$ is the nested label of a descendant of $u$. Notice that the fact that
$\ell(v)$ is contained in $\ell(u)$ does not imply that $v$ is a descendant of $u$: several instances
of this fact can be detected in the network represented in Fig. 1. Notice moreover that
$\ell(v) \in \ell(u)$ if, and only if, $\ell(v)$ is the nested label of a child of $u$.

Nakhleh's equivalence relation is easily characterized in terms of nested labels. 



Proposition 2. Let $N_1 = (V_1, E_1)$ and $N_2 = (V_2, E_2)$ be a pair of phylogenetic networks
(not necessarily different) labeled in a set $S$. For every $u \in V_1$ and $v \in V_2$, $u \equiv v$ if, and
only if, $\ell(u) = \ell(v)$.

Proof. We prove the equivalence by induction on the height of one of the nodes, say $u$.

If $h(u) = 0$, then it is a leaf, and $\ell(u)$ is the one-element set consisting of its label.
Thus, in this case, $u \equiv v$ if, and only if, $v$ is the leaf of $N_2$ with the same label as $u$, and
$\ell(u) = \ell(v)$ if, and only if, $v$ is the leaf of $N_2$ with the same label as $u$, too.

Consider now the case when $h(u) = m > 0$ and assume that the thesis holds for all
nodes $u' \in V_1$ of height smaller than $m$. Let $u_1, \ldots, u_k$ be the children of $u$. Then:

- $u \equiv v$ if, and only if, $v$ has exactly $k$ children and they can be ordered $v_1, \ldots, v_k$ in such
a way that $u_i \equiv v_i$ for every $i = 1, \ldots, k$.

- $\ell(u) = \ell(v)$ if, and only if, $v$ has exactly $k$ children and the multiset of their nested labels
is equal to the multiset of nested labels of $u_1, \ldots, u_k$, which means that $v$'s children
can be ordered $v_1, \ldots, v_k$ in such a way that $\ell(u_i) = \ell(v_i)$ for every $i = 1, \ldots, k$.

Since, by induction, the children of $u$ satisfy the thesis, it is clear that $u \equiv v$ is equivalent
to $\ell(u) = \ell(v)$. □

Thus, we can rewrite Nakhleh's dissimilarity measure in terms of nested labels. 

Definition 4. For every S-DAG $N$, the nested labels representation of $N$ is the multiset
$\Upsilon(N)$ of nested labels of its nodes (where each nested label appears with multiplicity the
number of nodes having it).

Proposition 3. For every pair $N_1, N_2$ of phylogenetic networks over the same set $S$ of
taxa,

$$m(N_1, N_2) = \tfrac{1}{2}\,|\Upsilon(N_1) \,\triangle\, \Upsilon(N_2)|,$$

where $\triangle$ denotes the symmetric difference of multisets.

Proof. Let $N_1, N_2$ be a pair of phylogenetic networks over the same set $S$ of taxa, and
$U(N_1)$, $U(N_2)$ maximal sets of non-equivalent nodes in them. Since the equivalence of
nodes is synonymous with having the same nested labels, it is clear that, for every $i = 1, 2$,
$\Upsilon(N_i)$ is the multiset consisting of $\kappa(v)$ copies of $\ell(v)$, for each $v \in U(N_i)$. Then:

- If $v_1 \in U(N_1)$ is not equivalent to any node in $U(N_2)$, then $\ell(v_1)$ contributes
$\kappa(v_1) = \delta(v_1)$ to $|\Upsilon(N_1) \,\triangle\, \Upsilon(N_2)|$.

- If $v_2 \in U(N_2)$ is not equivalent to any node in $U(N_1)$, then $\ell(v_2)$ contributes
$\kappa(v_2) = \delta(v_2)$ to $|\Upsilon(N_1) \,\triangle\, \Upsilon(N_2)|$.

- If $v_1 \in U(N_1)$ is equivalent to $v_2 \in U(N_2)$, then $\ell(v_1) = \ell(v_2)$ contributes

$$|\kappa(v_1) - \kappa(v_2)| = \max\{0, \kappa(v_1) - \kappa(v_2)\} + \max\{0, \kappa(v_2) - \kappa(v_1)\} = \delta(v_1) + \delta(v_2)$$

to $|\Upsilon(N_1) \,\triangle\, \Upsilon(N_2)|$.

This implies that

$$|\Upsilon(N_1) \,\triangle\, \Upsilon(N_2)| = \sum_{v_1 \in U(N_1)} \delta(v_1) + \sum_{v_2 \in U(N_2)} \delta(v_2),$$

from where the equality $m(N_1, N_2) = \frac{1}{2}|\Upsilon(N_1) \,\triangle\, \Upsilon(N_2)|$ follows. □

In the rest of this paper, we shall use the definition of m provided by the last proposi- 
tion. 

The value $m(N_1, N_2)$ can be computed in time polynomial in the sizes of the networks
$N_1, N_2$ by performing a simultaneous bottom-up traversal of the two networks [13, 14].
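As a rough illustration of the formula in Proposition 3, the following Python sketch computes $m$ as half the cardinal of the symmetric difference of the multisets of nested labels. It is not the traversal algorithm of [13, 14]: it labels each network separately with canonical strings, and no claim about its efficiency is intended. The encoding, the function names and the two toy networks are assumptions of the illustration.

```python
from collections import Counter
from functools import lru_cache

def nested_label_representation(children):
    """Multiset (Counter) of canonical nested-label strings of all nodes."""
    @lru_cache(maxsize=None)
    def label(v):
        kids = children.get(v, [])
        if not kids:  # leaf: singleton consisting of its label
            return "{%s}" % v
        return "{" + ",".join(sorted(label(c) for c in kids)) + "}"

    nodes = set(children) | {c for kids in children.values() for c in kids}
    return Counter(label(v) for v in nodes)

def nakhleh_m(net1, net2):
    """m(N1, N2) = |symmetric difference of nested-label multisets| / 2."""
    r1 = nested_label_representation(net1)
    r2 = nested_label_representation(net2)
    return sum(abs(r1[x] - r2[x]) for x in set(r1) | set(r2)) / 2

# Hypothetical examples: a network with one hybrid node "h", and a tree.
net = {"r": ["a", "b"], "a": [1, "h"], "b": ["h", 3], "h": [2]}
tree = {"r": ["a", 3], "a": [1, 2]}
print(nakhleh_m(net, tree))  # 3.0
```

For these two toy networks the μ-distance is 2, so the example is consistent with the inequality between $m$ and the μ-distance proved in Section 4.2.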



4 The separation power of m 



4.1 m separates distinguishable networks 

Nakhleh proved in [12, Thm. 2] the following result. 

Proposition 4. Let $R(N_1)$ and $R(N_2)$ be the reduced versions of two phylogenetic networks
on the same set $S$ of taxa. Then, $m(R(N_1), R(N_2)) = 0$ if, and only if, $R(N_1) \cong R(N_2)$. □

In this subsection we extend this result by showing that $m$ separates phylogenetic
networks that are distinguishable up to reduction; a sketch of this proof can be found in
our previous preprint [6]. We would like to recall here that this was the (unaccomplished:
see [8]) goal of the error metric defined in [11].

Theorem 2. Let $N_1$ and $N_2$ be two phylogenetic networks on the set $S$ of taxa. If $m(N_1, N_2) = 0$,
then $N_1$ and $N_2$ are indistinguishable.

Proof. Let $N_1 = (V_1, E_1)$ and $N_2 = (V_2, E_2)$ be two phylogenetic networks such that
$\Upsilon(N_1) = \Upsilon(N_2)$. We shall prove that the reduction process of both networks modifies
their nested labels representations in exactly the same way, and thus the reduced versions
$R(N_1)$ and $R(N_2)$ are also such that $\Upsilon(R(N_1)) = \Upsilon(R(N_2))$. Then, by Proposition 4, the
latter are isomorphic.

To begin with, notice that two nodes are convergent when the sets of taxa appearing
in their nested labels are the same (without taking into account nesting levels or multiplicities).
In particular, $N_1$ and $N_2$ have the same sets of nested labels of convergent
nodes.

Step (0) in the reduction process consists of replacing every clade by a symbolic leaf.
This corresponds to removing the nested labels of the nodes belonging to clades (except their
roots) and to replacing, in all remaining nested labels, each nested label of a root of a clade
by the label of the corresponding symbolic leaf. We must prove now that we can decide
from the nested labels representations alone which are the nested labels of nodes of clades
and of roots of clades.

Since the clades of a phylogenetic network are subtrees, a node belonging to a clade is
only equivalent to itself (if $u$ is a node of a clade and $\ell(u) = \ell(v)$, then $C_L(u) = C_L(v)$,
but in this case, since $v$ is the least common ancestor of $C_L(v)$ in the clade it belongs to, $v$
must be a descendant of $u$, and since $u$ and $v$ have the same height, because they have
the same nested label, they must be the same node). In particular, a node of a clade does
not share its nested label with any other node.

Then, the nested labels of nodes $v \in V_i$ belonging to some clade of $N_i$ ($i = 1, 2$)
are characterized by the following two properties: $\ell(v)$ and each one of the nested labels
contained in it appear with multiplicity 1 in $\Upsilon(N_1) = \Upsilon(N_2)$ (and in particular $v$ and
its descendants are characterized by their nested labels); and $\ell(v)$ and each one of the
nested labels contained in it belong at most to one nested label (this means that $v$ and its
descendants are tree nodes, and in particular that the rooted subnetwork generated by $v$
is a tree consisting only of tree nodes from $N_i$). Therefore the roots of clades of $N_i$ are
the nodes $v$ with nested label $\ell(v)$ maximal with these properties, and the nodes of the
clade rooted at $v$ are those nodes with nested labels contained in $\ell(v)$. This shows that
the nested labels of roots of clades and the nested labels of nodes belonging to clades in
$N_1$ are the same as in $N_2$.



So, we remove the same nested labels in $N_1$ and $N_2$ and we replace the same nested
labels by symbolic leaves. As a consequence, the networks resulting after this step have
the same nested labels.

In step (1), all internal nodes that are convergent with some other node are removed,
and all nodes other than (symbolic or hybrid) leaves that are descendants of some removed
node are also removed. So, in this step we remove the nested labels other than singletons
of convergent nodes, and the nested labels other than singletons that are contained in
a nested label of some convergent node (notice that if $\ell(v)$ is not a singleton and it is
contained in $\ell(u)$ and $u$ is convergent, then either $v$ is a descendant of $u$, and then it has
to be removed, or it is equivalent to a descendant of $u$, and then it is convergent with this
descendant and it has to be removed, too). This shows that the nested labels of the nodes
removed in both networks are the same, and hence that the nested labels of the nodes that
remain in both networks are also the same.

In step (2), the paths from the remaining nodes to the labels are restored. This means
replacing, in each remaining nested label $\ell(x)$, each maximal nested label $\ell(v) \preceq \ell(x)$ of
a removed node $v$ by the singletons $\{s_1\}, \{s_2\}, \ldots, \{s_p\}$ of the leaves appearing in $\ell(v)$.
Again, this operation only depends on the nested labels, and therefore after this step the
resulting DAGs have the same nested labels representations.

In step (3), clades are restored. This is simply done by replacing in the nested labels 
each symbolic leaf s by the nested label of the root of the clade it replaced, between 
brackets (because we append it to the node corresponding to the symbolic leaf). Since the 
same clades were removed in both networks and replaced by the same symbolic leaves, 
after this step the resulting DAGs still have the same nested labels representations. 

Finally, in step (4), the nodes with only one parent and only one child are removed.
This corresponds to removing nested labels of the form $\{\{\ldots\}\}$ that are children of only one
parent (that is, that belong to only one nested label), and replacing them in the nested
labels containing them by the corresponding nested label $\{\ldots\}$ without the outer brackets.
This shows that the same nested labels are removed in both DAGs and that the remaining
nested labels are modified in exactly the same way.

So, at the end of this procedure, the resulting DAGs $R(N_1)$ and $R(N_2)$ have the same
nested labels representations. By Proposition 4, this implies that $R(N_1)$ and $R(N_2)$ are
isomorphic. □

The converse implication is, of course, false: since the reduction process may remove
parts with different topologies that yield differences in the nested labels representations, 
two phylogenetic networks with isomorphic reduced versions may have different nested 
labels representations. 

4.2 m refines the μ-distance

As a direct consequence of Proposition 4, Nakhleh deduced that $m$ satisfies the separation
axiom of metrics on the class of all reduced phylogenetic networks on the same set $S$ of
taxa. In this subsection we show two other independent classes of phylogenetic networks
where $m$ satisfies this axiom. The key observation in our proofs is that $m$ refines the
μ-distance, in the sense of Proposition 5 below.

Lemma 1. Let $v$ be a node in a phylogenetic network $N$ on a set $S$ of taxa. For every
$i \in S$, $m_i(v)$ is the number of times the label $i$ appears in $\ell(v)$.



Proof. We prove it by induction on $h(v)$. If $h(v) = 0$, then $v$ is a leaf, and therefore
$m_i(v) = 1$ if $\ell(v) = \{i\}$ and $m_i(v) = 0$ if $\ell(v) = \{j\}$, for some $j \in S \setminus \{i\}$.

Assume now that the statement is true for all nodes of height at most $m - 1$, and let
$v$ be a node of height $m$. Let $v_1, \ldots, v_k$ be the children of $v$, all of them of height lower
than $m$. Then, on the one hand, $m_i(v) = m_i(v_1) + \cdots + m_i(v_k)$ by [7, Lem. 4], and, on the
other hand, since $\ell(v) = \{\ell(v_1), \ldots, \ell(v_k)\}$, it is clear that the number of times the label $i$
appears in $\ell(v)$ is equal to the sum of the numbers of times it appears in the nested labels
$\ell(v_1), \ldots, \ell(v_k)$, which is equal, by the induction hypothesis, to $m_i(v_1) + \cdots + m_i(v_k)$. □
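The statement of Lemma 1 can be checked numerically on small examples. The following Python sketch compares, for every node and every label, the number of paths to the corresponding leaf with the number of occurrences of the label in a string encoding of the nested label. The encoding, the name `lemma1_holds` and the toy network are assumptions of this illustration, and such a check is of course no substitute for the proof.

```python
import re
from functools import lru_cache

def lemma1_holds(children, n):
    """Check, on one network with leaves 1..n, the equality of Lemma 1."""
    @lru_cache(maxsize=None)
    def label(v):
        kids = children.get(v, [])
        if not kids:
            return "{%s}" % v
        return "{" + ",".join(sorted(label(c) for c in kids)) + "}"

    @lru_cache(maxsize=None)
    def paths(v, i):
        """Number of different paths from v to the leaf i."""
        kids = children.get(v, [])
        if not kids:
            return 1 if v == i else 0
        return sum(paths(c, i) for c in kids)

    nodes = set(children) | {c for kids in children.values() for c in kids}
    return all(
        # Occurrences of the label i among the integers written in label(v).
        re.findall(r"\d+", label(v)).count(str(i)) == paths(v, i)
        for v in nodes
        for i in range(1, n + 1)
    )

# Hypothetical toy network: leaf 2 is reached from "r" by two paths.
toy = {"r": ["a", "b"], "a": [1, "h"], "b": ["h", 3], "h": [2]}
print(lemma1_holds(toy, 3))  # True
```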

Corollary 1. Let $N_1 = (V_1, E_1)$ and $N_2 = (V_2, E_2)$ be phylogenetic networks on the same
set $S$ of taxa. For every $v_1 \in V_1$ and $v_2 \in V_2$, if $\ell(v_1) = \ell(v_2)$, then $\mu(v_1) = \mu(v_2)$. □

Proposition 5. Let $N_1 = (V_1, E_1)$ and $N_2 = (V_2, E_2)$ be two phylogenetic networks on
the same set $S$ of taxa. Then, $m(N_1, N_2) \geq d_\mu(N_1, N_2)$.

Proof. Let us rename the nodes of $N_1$ and $N_2$ as

$$V_1 = \{v_1, v_2, \ldots, v_l, v_{l+1}, \ldots, v_m, \ldots, v_s\}, \qquad V_2 = \{w_1, w_2, \ldots, w_l, w_{l+1}, \ldots, w_m, \ldots, w_t\},$$

with $|V_1| = s$ and $|V_2| = t$ and $l \leq m \leq s, t$, in such a way that:

- for every $i = 1, \ldots, l$, $\ell(v_i) = \ell(w_i)$ (and hence, by the last corollary, $\mu(v_i) = \mu(w_i)$),
while, for every $j = l+1, \ldots, s$ and $k = l+1, \ldots, t$, $\ell(v_j) \neq \ell(w_k)$;

- for every $i = l+1, \ldots, m$, $\mu(v_i) = \mu(w_i)$, while, for every $j = m+1, \ldots, s$ and
$k = m+1, \ldots, t$, $\mu(v_j) \neq \mu(w_k)$.

Therefore

$$|\Upsilon(N_1) \,\triangle\, \Upsilon(N_2)| = (s - l) + (t - l) \geq (s - m) + (t - m) = |\mu(N_1) \,\triangle\, \mu(N_2)|,$$

as we claimed. □

Corollary 2. If $d_\mu$ satisfies the separation axiom on a class of phylogenetic networks, $m$
also satisfies it.

Proof. Let $\mathcal{N}$ be a class of phylogenetic networks such that $d_\mu(N_1, N_2) = 0$ implies $N_1 \cong N_2$
for every $N_1, N_2 \in \mathcal{N}$. Let now $N_1, N_2 \in \mathcal{N}$ be such that $m(N_1, N_2) = 0$. Since
$m(N_1, N_2) \geq d_\mu(N_1, N_2) \geq 0$, we conclude that $d_\mu(N_1, N_2) = 0$ and hence, by assumption,
$N_1 \cong N_2$. □

Combining this result with Theorem 1 we obtain the following result. 



Corollary 3. Let $N_1$ and $N_2$ be two phylogenetic networks on the same set $S$ of taxa. 
Assume that one of the following two conditions holds: 

(a) $N_1$ and $N_2$ are both tree-child, or 

(b) $N_1$ and $N_2$ are both semi-binary, time consistent and tree-sibling. 

Then, $T(N_1) = T(N_2)$ implies $N_1 \cong N_2$. □ 



In particular, by Proposition 1, $m$ is a metric on the classes of all tree-child and of all 
semi-binary, time consistent tree-sibling phylogenetic networks. 




Fig. 3. Two non-isomorphic reduced phylogenetic networks at $\mu$-distance 0. 

Remark 1. It is important to point out that the $\mu$-distance does not satisfy the separation 
axiom on the class of reduced phylogenetic networks: for instance, the reduced networks 
$N_9$ and $N_{10}$ in [8, Fig. 11], which we recall in Fig. 3, have the same $\mu$-representations, but 
they are not isomorphic. Therefore, Nakhleh's metric $m$ has a stronger separating power 
than the $\mu$-distance, in the sense that it satisfies the separation axiom in every class where 
$d_\mu$ satisfies it, and in at least one class where $d_\mu$ does not satisfy it. 

Remark 2. It is false in general that if two arbitrary time consistent tree-sibling phylogenetic 
networks $N_1$ and $N_2$ on the same set $S$ of taxa are such that $m(N_1, N_2) = 0$, then 
$N_1 \cong N_2$. For instance, it is easy to check that the networks depicted in Fig. 4 have the 
same nested labels representations, but they are not isomorphic. Thus, Nakhleh's dissimilarity 
measure is not a metric on the class of all time consistent tree-sibling phylogenetic 
networks. 

4.3 m singles out MUL-trees 

The comparison of MUL-trees generalizes simultaneously the comparison of non-labeled 
rooted trees (understood as MUL-trees with all their leaves labeled with the same label) 
and of rooted phylogenetic trees (MUL-trees where each leaf has one label, and different 
leaves have different labels). Ganapathy et al. have recently proposed in [10] two metrics 
for MUL-trees: an edit distance that generalizes the Robinson-Foulds distance for phylogenetic 
trees, and a metric based on the computation of the multi-labeled analogue of a 
Maximum Agreement Subtree. In this subsection we show that the natural generalization 
of Nakhleh's $m$ to MUL-trees is also a metric on the space of MUL-trees on a given set $S$ 
of taxa. 

The definition of the nested labeling generalizes to MUL-trees in a natural way as 
follows. 

Definition 5. Let $M = (V, E)$ be a MUL-tree on a set $S$ of labels. The nested labeling 
$\ell(v)$ of the nodes $v$ of $M$ is defined by induction on $h(v)$ as follows: 

— If $h(v) = 0$, that is, if $v$ is a leaf, and if the set of labels of $v$ is $S_v \subseteq S$, then $\ell(v) = S_v$. 

— If $h(v) = m > 0$ and if the children of $v$ are $v_1, \ldots, v_k$, then $\ell(v)$ is the multiset of 
their nested labels: 

\[ \ell(v) = \{\ell(v_1), \ldots, \ell(v_k)\}. \]




Fig. 4. These time consistent tree-sibling phylogenetic networks have the same nested labels 
representations, but they are not isomorphic 

The nested labels representation of $M$ is the multiset $T(M)$ of nested labels of its nodes 
(where each nested label appears with multiplicity equal to the number of nodes having it). 

With this definition of nested labels, Nakhleh's dissimilarity measure $m$ for MUL-trees 
is simply defined as in Proposition 3: half the cardinality of the symmetric difference of the 
nested labels representations. 
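
A minimal Python sketch of this definition follows. It assumes, for illustration only, that every leaf carries a single label, that a MUL-tree is encoded as nested tuples whose leaves are strings, and that multisets are represented by sorted tuples and `collections.Counter`; the function names are hypothetical.

```python
from collections import Counter


def nested_labels(tree, out=None):
    """Return (nested label of the root, multiset of nested labels of all nodes).

    A leaf is a string (its label); an internal node is a tuple of subtrees.
    The nested label of an internal node is the sorted tuple of its children's
    nested labels, which serves as a canonical form of the multiset.
    """
    if out is None:
        out = Counter()
    if isinstance(tree, str):
        label = tree
    else:
        label = tuple(sorted((nested_labels(c, out)[0] for c in tree), key=repr))
    out[label] += 1
    return label, out


def nakhleh_m(tree1, tree2):
    """Half the cardinality of the symmetric difference of the two multisets."""
    a = nested_labels(tree1)[1]
    b = nested_labels(tree2)[1]
    return sum(((a - b) + (b - a)).values()) / 2


M1 = (("a", "b"), ("a", "c"))
M2 = (("c", "a"), ("b", "a"))   # same MUL-tree as M1, children reordered
M3 = (("a", "a"), ("b", "c"))   # same leaf labels, different shape

print(nakhleh_m(M1, M2), nakhleh_m(M1, M3))
```

In this example `M1` and `M3` share only their four leaf labels, so each contributes three unmatched internal nodes to the symmetric difference.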

Notice now that, in a MUL-tree, the nested label of a node yields, after replacing 
brackets by parentheses, the Newick string of the subtree rooted at that node. In particular, 
if two nodes in two MUL-trees have the same nested labels, then the subtrees rooted at 
them are isomorphic. This remark lies at the basis of the following proof. 

Proposition 6. Let $M_1$ and $M_2$ be two MUL-trees on the same set $S$ of labels. If $T(M_1) = 
T(M_2)$, then $M_1 \cong M_2$. 

Proof. Let $T(M_1)$ and $T(M_2)$ be the nested labels representations of $M_1$ and $M_2$. If 
$m(M_1, M_2) = 0$, then $T(M_1) = T(M_2)$. Now, if $r_1$ and $r_2$ are the roots of $M_1$ and $M_2$, 
respectively, then $\ell(r_1)$ and $\ell(r_2)$ are the nested labels with highest level of nesting in 
$T(M_1)$ and $T(M_2)$, respectively, and then, these multisets being equal, it must happen 
that $\ell(r_1) = \ell(r_2)$. But then the subtrees of $M_1$ and $M_2$ rooted at $r_1$ and $r_2$, that is, the 
MUL-trees $M_1$ and $M_2$ themselves, are isomorphic. □ 
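
The observation behind this proof, that the nested label of the root is essentially a Newick string of the whole MUL-tree, can be illustrated with a short Python sketch. The nested-tuple encoding (leaves as strings, one label per leaf) and the function names are assumptions made for this example only.

```python
def canonical(tree):
    """Nested label of the root: children's labels sorted into a canonical order."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def to_newick(label):
    """Write a nested label with parentheses, as in a Newick string."""
    if isinstance(label, str):
        return label
    return "(" + ",".join(to_newick(child) for child in label) + ")"


M1 = (("a", "b"), ("a", "c"))
M2 = (("c", "a"), ("b", "a"))   # M1 with children listed in another order

# Equal root labels give the same Newick string, hence isomorphic MUL-trees.
print(to_newick(canonical(M1)) + ";")
print(canonical(M1) == canonical(M2))
```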

In particular, $m$ is a metric on the class of all MUL-trees labeled in a given set $S$. 

5 Conclusion 

In this paper we have complemented Luay Nakhleh's latest proposal of a metric $m$ for 
phylogenetic networks by showing that it separates distinguishable networks, and that it 
satisfies the separation axiom on the classes of tree-child and of semi-binary time consistent 
tree-sibling phylogenetic networks as well as of area cladograms. When $m$ is applied to 
phylogenetic trees, it yields half the cardinality of the symmetric difference of the sets of 
(isomorphism classes of) subtrees [12], and it can be computed in time polynomial in the 
size of the networks. 

Given a set $S$ of $n \geq 2$ labels, there is no upper bound for the values of $m(N_1, N_2)$, 
since there exist arbitrarily large phylogenetic networks with $n$ leaves such that no internal 
node of either one is equivalent to an internal node of the other. 